
Using the Internet Archive

The Archive’s purpose is to provide free access to the research and academic communities. It does not charge for access to the collections. Read about how we are using the collections.

Accessing the Collections

If the Wayback Machine does not provide the access you need and you would like secure telnet access to the Web or Usenet collections, use our form to send a proposal. Using the collections requires agreeing to our terms of use and privacy and copyright policies.

Technical Requirements

While the Internet Archive does not charge for access to its collections, you will need Unix programming skills to gain access to and use a collection of Web snapshots or Usenet postings. (The Wayback Machine provides free, easy-to-use access to individual Web pages.)

The diagram shows the architecture for storage of and access to the Web collections:

Figure: Architecture for Storage of and Access to the Internet Archive's Collections

The Archive assigns each user an ssh (secure shell) access account and disk space on the server facade.archive.org. (Secure shell access provides character-terminal log-in; it’s similar to Telnet access but more secure.) The server runs the Linux operating system.

The server facade.archive.org has access to a series of Linux machines (named ia000.archive.org, ia001.archive.org, and so on). Each machine has either 12 or 20 disk drives (named 0, 1, 2, and so on). On each drive are three types of files:

  • Files in ARC format, each of which contains the complete data for a number of files in the collection; see Alexa for more information on this format
  • Files in DAT or MDT format, which contain data extracted from the ARC files, such as URLs and image references (researchers can use these files to study link structure)
  • Files in IDX (index) format, each of which contains a list of URLs and their corresponding locations in the ARC and DAT files
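
The naming scheme and file types described above can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the three-digit zero padding is inferred from the two host names given (ia000 and ia001), and the idea that a file's format shows up as a matching file extension is hypothetical, since the text names the formats but not how the files themselves are named. The function names and role labels are invented for this example.

```python
from pathlib import PurePosixPath


def storage_host(index: int) -> str:
    """Return the host name for a storage machine number.

    Assumption: three-digit zero padding, inferred from the examples
    ia000.archive.org and ia001.archive.org.
    """
    if index < 0:
        raise ValueError("host index must be non-negative")
    return f"ia{index:03d}.archive.org"


# Assumption: each format is recognizable by a matching file extension.
# The role labels are shorthand for the three file types in the list above.
FORMAT_ROLES = {
    ".arc": "archived content",
    ".dat": "extracted metadata",
    ".mdt": "extracted metadata",
    ".idx": "URL index",
}


def file_role(filename: str) -> str:
    """Classify a file by its (hypothetical) extension, ignoring case."""
    suffix = PurePosixPath(filename).suffix.lower()
    return FORMAT_ROLES.get(suffix, "unknown")
```

For example, `storage_host(1)` gives `ia001.archive.org`, and `file_role("sample.idx")` gives `URL index`; any extension not in the table is reported as `unknown`.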

Users access the hard drives where the collections reside by referencing these remote files from facade.archive.org. You can use either FTP or NFS (network file system) access.